SparCE: Sparsity aware General Purpose Core Extensions to Accelerate Deep Neural Networks
Authors
Abstract
Deep Neural Networks (DNNs) have emerged as the method of choice for solving a wide range of machine learning tasks. Meeting the enormous growth in computational demand posed by DNNs is a key challenge for computing system designers and has most commonly been addressed through the design of custom accelerators. However, these specialized accelerators, which utilize large quantities of multiply-accumulate units and on-chip memory, are prohibitive in many design scenarios (e.g., wearable devices and IoT sensors) due to stringent area/cost constraints. Therefore, accelerating DNNs on these low-power systems, comprising mainly the indispensable general-purpose processor (GPP) cores, requires new approaches. In this work, we focus on improving the performance of DNNs on GPPs by exploiting a key attribute of DNNs, i.e., sparsity. We propose Sparsity aware Core Extensions (SPARCE), a set of micro-architectural and ISA extensions that leverage sparsity and are minimally intrusive and low-overhead. We address the key challenges associated with exploiting sparsity in GPPs, viz., dynamically detecting whether an operand (e.g., the result of a load instruction) is zero and subsequently skipping a set of future instructions that use it. To maximize performance benefits, our design ensures that the instructions to be skipped are prevented from even being fetched, as squashing instructions comes with a penalty (e.g., a pipeline stall). SPARCE consists of two key micro-architectural enhancements. First, a Sparsity Register File (SpRF) tracks registers that are zero. Second, a Sparsity aware Skip Address (SASA) table indicates instruction sequences that can be skipped and associates specific SpRF registers that trigger instruction skipping. When an instruction is fetched, SPARCE dynamically pre-identifies whether the following instruction(s) can be skipped and, if so, appropriately modifies the program counter, thereby skipping the redundant instructions and improving performance. We model SPARCE using the gem5 architectural simulator, and evaluate our approach on six state-of-the-art image-recognition DNNs in the context of both training and inference using the Caffe deep learning framework. On a scalar microprocessor, SPARCE achieves 1.11×-1.96× speedups across both convolution and fully-connected layers with 10%-90% sparsity, and a 19%-31% reduction in execution time at the overall application level. We also evaluate SPARCE on a 4-way SIMD ARMv8 processor using the OpenBLAS library, and demonstrate that SPARCE achieves an 8%-15% reduction in application-level execution time.
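To make the mechanism described in the abstract concrete, the following Python sketch models the fetch-stage skip decision under simplifying assumptions of our own: a single-issue core, word-granular program counters, an all-zero trigger condition, and names such as SparCEFrontEnd, SASAEntry, check_regs, and skip_count are all illustrative rather than the paper's exact design.

from dataclasses import dataclass

@dataclass
class SASAEntry:
    trigger_pc: int    # PC at which the skip check is performed
    check_regs: tuple  # architectural registers whose zero-ness triggers the skip
    skip_count: int    # number of subsequent (dependent) instructions to skip

class SparCEFrontEnd:
    def __init__(self, num_regs=32):
        self.sprf = [False] * num_regs  # Sparsity Register File: True -> register holds zero
        self.sasa = {}                  # SASA table: trigger_pc -> SASAEntry

    def install(self, entry: SASAEntry):
        self.sasa[entry.trigger_pc] = entry

    def writeback(self, reg: int, value: float):
        # Executed instructions record the zero-ness of their results in the SpRF.
        self.sprf[reg] = (value == 0.0)

    def next_pc(self, pc: int) -> int:
        # Fetch-stage check: if the tracked registers are all zero, advance the PC
        # past the redundant instructions so they are never fetched (no squashing).
        entry = self.sasa.get(pc)
        if entry is not None and all(self.sprf[r] for r in entry.check_regs):
            return pc + 1 + entry.skip_count
        return pc + 1

# Illustrative use: a load writes zero into r3, so the four dependent
# multiply-accumulate instructions following PC 10 are skipped at fetch.
fe = SparCEFrontEnd()
fe.install(SASAEntry(trigger_pc=10, check_regs=(3,), skip_count=4))
fe.writeback(3, 0.0)
assert fe.next_pc(10) == 15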
Similar resources
Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units
Rectifier neuron units (ReLUs) have been widely used in deep convolutional networks. A ReLU converts negative values to zeros and leaves positive values unchanged, which leads to a high sparsity of neuron outputs. In this work, we first examine the sparsity of the outputs of ReLUs in some popular deep convolutional architectures. We then use the sparsity property of ReLUs to accelerate the calcul...
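A minimal, hypothetical illustration of the sparsity these works exploit: since ReLU(x) = max(x, 0), roughly half of zero-mean activations are zeroed out.

import numpy as np

x = np.random.randn(1000)      # hypothetical pre-activation values
y = np.maximum(x, 0.0)         # ReLU: negatives become 0, positives pass through unchanged
print("output sparsity:", np.mean(y == 0.0))  # roughly 0.5 for zero-mean inputs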
Learning Structured Sparsity in Deep Neural Networks
High demand for computation resources severely hinders the deployment of large-scale Deep Neural Networks (DNNs) in resource-constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) ob...
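As a rough sketch (the grouping and weighting in the cited paper may differ), such structured sparsity is commonly obtained by adding a group-Lasso penalty over structural groups g (e.g., filters or channels) to the data loss E_D:

E(W) = E_D(W) + \lambda \sum_{g} \left\lVert W^{(g)} \right\rVert_2

Because the penalty acts on the 2-norm of each group as a whole, entire filters or channels are driven exactly to zero and can be removed from the network.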
Combined Group and Exclusive Sparsity for Deep Neural Networks
The number of parameters in a deep neural network is usually very large, which helps with its learning capacity but also hinders its scalability and practicality due to memory/time inefficiency and overfitting. To resolve this issue, we propose a sparsity regularization method that exploits both positive and negative correlations among the features to enforce the network to be sparse, and at th...
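One common way to write such a combined regularizer, shown here only as a hedged sketch rather than the cited paper's exact objective, pairs a group term (which zeroes whole groups of correlated features) with an exclusive term (which makes features within a group compete):

R(W) = \mu \sum_{g} \left\lVert W^{(g)} \right\rVert_2 + \frac{1-\mu}{2} \sum_{g} \Big( \sum_{i \in g} \lvert w_i \rvert \Big)^{2}

where \mu balances the two effects.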
On optimization methods for deep learning
The predominant methodology in training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limite...
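For reference, the SGD update being contrasted with quasi-Newton and conjugate-gradient methods is the standard one, with a step size \eta that must be tuned by hand:

\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \mathcal{L}(\theta_t; \mathcal{B}_t)

where \mathcal{B}_t is the mini-batch sampled at step t.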
A Reconfigurable Low Power High Throughput Streaming Architecture for Big Data Processing
General-purpose computing systems are used for a large variety of applications. Extensive support for flexibility in these systems limits their energy efficiency. Neural networks, including deep networks, are widely used for signal processing and pattern recognition applications. In this paper we propose a multicore architecture for deep neural network based processing. Memristor crossbars a...
Journal: CoRR
Volume: abs/1711.06315
Year of publication: 2017